[REVIEW] Optimize cudf.concat for axis=0 #9222
Codecov Report
@@              Coverage Diff              @@
##           branch-21.10    #9222   +/-   ##
===============================================
  Coverage              ?   10.81%
===============================================
  Files                 ?      115
  Lines                 ?    19170
  Branches              ?        0
===============================================
  Hits                  ?     2074
  Misses                ?    17096
  Partials              ?        0
    elif are_all_range_index and not ignore_index:
        out._index = cudf.core.index.GenericIndex._concat(
            [o._index for o in objs]
        )
Is this case not included in line 1218?
No. That branch expects the index columns to already be materialized, whereas here we want to avoid materializing them and instead hit the specialized RangeIndex concatenation logic already present in index.py:
https://github.com/rapidsai/cudf/blob/branch-21.10/python/cudf/cudf/core/index.py#L688-L702
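To make the branching above concrete, here is a minimal pure-Python sketch of how the row index of an axis=0 concatenation can be chosen without materializing RangeIndex objects. It is not cudf code: a Python range stands in for a RangeIndex (only start/stop/step are stored), a list stands in for a materialized index column, and the function name concat_row_index is hypothetical. The ignore_index handling (a fresh 0..n-1 range) is an assumption modeled on pandas semantics, and the chaining test is deliberately simpler and more conservative than the real RangeIndex concatenation rule.

```python
from typing import List, Sequence, Union

# Hypothetical stand-ins, NOT cudf types: a Python ``range`` models a
# RangeIndex (only start/stop/step are stored) and a ``list`` models an
# index whose labels have been materialized into a column.
Index = Union[range, List[int]]


def concat_row_index(indexes: Sequence[Index], ignore_index: bool) -> Index:
    """Pick the index for an axis=0 concatenation of several frames."""
    if ignore_index:
        # Input labels are discarded, so a fresh 0..n-1 range is enough.
        return range(sum(len(ix) for ix in indexes))

    if all(isinstance(ix, range) for ix in indexes):
        # Every input is lazy. Stay lazy if the non-empty ranges chain end
        # to end with one common step (a simplified, conservative rule).
        non_empty = [ix for ix in indexes if len(ix) > 0]
        if not non_empty:
            return range(0)
        step = non_empty[0].step
        chained = all(
            cur.step == step and cur.start == prev[-1] + step
            for prev, cur in zip(non_empty, non_empty[1:])
        )
        if chained:
            return range(non_empty[0].start, non_empty[-1][-1] + step, step)

    # Mixed or non-chaining inputs: materialize every label (the costly path).
    return [label for ix in indexes for label in ix]
```

For example, concat_row_index([range(3), range(3, 5)], ignore_index=False) stays lazy as range(0, 5), whereas two copies of range(3) must be materialized because the labels 0, 1, 2 repeat.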
In particular, we want to hit _concat_range_index in this case:
cudf/python/cudf/cudf/core/index.py, line 2311 in eab2486:
    def _concat_range_index(indexes: List[RangeIndex]) -> BaseIndex:
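For intuition about what such a specialized path has to decide, the following sketch is closely modeled on the rule pandas applies in RangeIndex._concat: the result stays a range only if the non-empty inputs form one arithmetic progression; otherwise every label is materialized. It works on Python range objects rather than cudf RangeIndex objects, and concat_ranges and _materialize are hypothetical names; it is not a transcription of cudf's _concat_range_index.

```python
from typing import List, Optional, Sequence, Union


def _materialize(ranges: Sequence[range]) -> List[int]:
    # Fallback path: build every label explicitly (the expensive case).
    return [label for r in ranges for label in r]


def concat_ranges(ranges: Sequence[range]) -> Union[range, List[int]]:
    """Concatenate ranges, keeping the result lazy whenever possible."""
    non_empty = [r for r in ranges if len(r) > 0]
    if not non_empty:
        return range(0)
    if len(non_empty) == 1:
        # Empty inputs contribute nothing, so the lone non-empty range wins.
        return non_empty[0]

    start: Optional[int] = None
    step: Optional[int] = None
    next_start: Optional[int] = None  # label the next range must begin with
    for r in non_empty:
        if start is None:
            start = r.start
            if len(r) > 1:
                step = r.step
        elif step is None:
            # The first range had a single label, so infer the step from
            # the gap to this range's first label.
            if r.start == start:
                return _materialize(ranges)
            step = r.start - start
        # Not one progression: this range's own step disagrees, or it does
        # not start exactly where the previous range left off.
        if (len(r) > 1 and r.step != step) or (
            next_start is not None and r.start != next_start
        ):
            return _materialize(ranges)
        if step is not None:
            next_start = r[-1] + step

    # With two or more non-empty ranges, step and next_start are set here.
    return range(start, next_start, step)
```

For instance, concat_ranges([range(0, 4, 2), range(4, 8, 2)]) returns range(0, 8, 2), while concat_ranges([range(3), range(5, 8)]) falls back to the list [0, 1, 2, 5, 6, 7].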
@gpucibot merge
This PR optimizes cudf.concat when axis=0 by not materializing the RangeIndex objects that serve as the index of the input DataFrame objects.

Partially addresses #9200. This is the first of two optimization PRs; a follow-up PR optimizing axis=1 will be opened separately because it involves multiple large changes.

Here is a benchmark:

On branch-21.10:

This PR: